Given $F: [a,b]^k \to [a,b]$ and a nonconstant $X_0$ with $P(X_0 \in [a,b]) = 1$, define the hierarchical sequence of random variables $\{X_n\}_{n \ge 0}$ by $X_{n+1} = F(X_{n,1}, \ldots, X_{n,k})$, where $X_{n,i}$ are i.i.d. as $X_n$. Such sequences arise from hierarchical structures which have been extensively studied in the physics literature to model, for example, the conductivity of a random medium. Under an averaging and smoothness condition on nontrivial $F$, an upper bound of the form $C\gamma^n$ for $0 < \gamma < 1$ is obtained on the Wasserstein distance between the standardized distribution of $X_n$ and the normal. The results apply, for instance, to random resistor networks and, introducing the notion of strict averaging, to hierarchical sequences generated by certain compositions.

As an illustration, upper bounds on the rate of convergence to the 
normal are derived for the hierarchical sequence generated by the 
weighted diamond lattice which is shown to exhibit a full range of 
convergence rate behavior. 


1. Introduction. Let $k \ge 2$ be an integer, $\mathcal{D} \subset \mathbb{R}$, $X_0$ a nonconstant random variable with $P(X_0 \in \mathcal{D}) = 1$ and $F: \mathcal{D}^k \to \mathcal{D}$ a given function. We consider the accuracy of the normal approximation for the sequence of hierarchical random variables $X_n$, where

(1)  $X_{n+1} = F(\mathbf{X}_n), \quad n \ge 0,$

and $\mathbf{X}_n = (X_{n,1}, \ldots, X_{n,k})^T$ with $X_{n,i}$ independent, each with distribution $X_n$.

Hierarchical variables have been considered extensively in the physics literature (see [5] and the references therein), in particular to model conductivity of a random medium. The diamond lattice in particular has been considered in [3, 7]. Figure 1 shows the progression of the diamond lattice from large to small scale. At the large scale [Figure 1(a)], the system displays some conductivity along the bond between its top and bottom nodes. Inspection on a finer scale reveals the bond actually comprises four smaller bonds, each similar to Figure 1(a), connected as shown in Figure 1(b). Further inspection of each of the four bonds in Figure 1(b) reveals them to be constructed in a self-similar way from bonds at an even smaller level, giving the successive diagram Figure 1(c) and so on.

Fig. 1. The diamond lattice.

We assume each bond has a fixed conductivity characteristic $w > 0$ such that when a component with conductivity $x > 0$ is present along the bond the net conductivity of the bond is $wx$. For the diamond lattice as in Figure 1(b), we associate conductivities $\mathbf{w} = (w_1, w_2, w_3, w_4)^T$, numbering from the top node and proceeding counterclockwise. If $\mathbf{x}_0 = (x_{0,1}, x_{0,2}, x_{0,3}, x_{0,4})^T$ are the conductances of four elements each as in Figure 1(a) which are present along the bonds in Figure 1(b), then applying the resistor circuit parallel and series combination rules, the conductivity between the top and bottom nodes in Figure 1(b) is $x_1 = F(\mathbf{x}_0)$, where

(2)  $F(\mathbf{x}) = \left(\frac{1}{w_1 x_1} + \frac{1}{w_2 x_2}\right)^{-1} + \left(\frac{1}{w_3 x_3} + \frac{1}{w_4 x_4}\right)^{-1}.$

The network in Figure 1(c) is constructed from four diamond structures similar to Figure 1(b), and endowing each with the same fixed conductivity characteristics $\mathbf{w}$, with $\mathbf{x}_1 = (x_{1,1}, x_{1,2}, x_{1,3}, x_{1,4})^T$ and each $x_{1,i}$ determined in the same manner as $x_1$, the conductance between the top and bottom nodes in Figure 1(c) is $x_2 = F(\mathbf{x}_1)$, and so forth.
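The recursion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `diamond`, `sample_level` and `draw_x0`, the uniform law chosen for $X_0$, and the equal weights are all hypothetical choices, not part of the paper.

```python
import random

def diamond(x, w):
    """Diamond lattice conductivity (2): two series pairs combined in parallel."""
    top = 1.0 / (1.0 / (w[0] * x[0]) + 1.0 / (w[1] * x[1]))
    bottom = 1.0 / (1.0 / (w[2] * x[2]) + 1.0 / (w[3] * x[3]))
    return top + bottom

def draw_x0(rng):
    """Hypothetical base distribution: X_0 uniform on [0.5, 1.5]."""
    return rng.uniform(0.5, 1.5)

def sample_level(n, w, rng):
    """Draw one realization of X_n by recursing down to independent copies of X_0."""
    if n == 0:
        return draw_x0(rng)
    return diamond([sample_level(n - 1, w, rng) for _ in range(4)], w)

rng = random.Random(0)
w = (1.0, 1.0, 1.0, 1.0)   # equal weights, for which F(1, 1, 1, 1) = 1
xs = [sample_level(3, w, rng) for _ in range(200)]   # 200 draws of X_3
```

Each draw of $X_n$ consumes $4^n$ independent copies of $X_0$, so this direct recursion is only practical for small $n$.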

In general, a function $F: \mathcal{D}^k \to \mathcal{D}$ and a distribution on $X_0$ such that $P(X_0 \in \mathcal{D}) = 1$ determine a sequence of distributions through $X_{n+1} = F(\mathbf{X}_n)$, where $\mathbf{X}_n = (X_{n,1}, \ldots, X_{n,k})^T$ with $X_{n,i}$ independent, each with distribution $X_n$. Conditions on $F$ which imply the weak law

(3)  $X_n \to_p c$

have been considered by various authors. Shneiberg [8] proves that (3) holds if $\mathcal{D} = [a,b]$ and $F$ is continuous, monotonically increasing, positively homogeneous, convex and satisfies the normalization condition $F(\mathbf{1}_k) = 1$, where $\mathbf{1}_k$ is the vector of all ones in $\mathbb{R}^k$. Li and Rogers in [5] provide rather weak conditions under which (3) holds for closed $\mathcal{D} \subset (-\infty, \infty)$. See also [4, 11, 12] for an extension of the model to random $F$ and applications of hierarchical structures to computer science.

Letting $X_0$ have mean $c$ and variance $\sigma^2$, the classical central limit theorem can be set in the framework of hierarchical sequences by letting

(4)  $F(x_1, x_2) = \frac{1}{2}(x_1 + x_2),$

which gives in distribution

$X_n = \frac{X_{0,1} + \cdots + X_{0,2^n}}{2^n}.$

Hence, $X_n \to_p c$, and since $X_n$ is an average of $N = 2^n$ i.i.d. variables with finite variance,

$W_n = \frac{X_n - c}{\sqrt{\mathrm{Var}(X_n)}} \to_d \mathcal{N}(0,1).$

Under some higher-order moment conditions one would expect a bound on the Wasserstein distance $d$ between $W_n$ and the standard normal $\mathcal{N}$ to decay at rate $N^{-1/2}$, that is, with $\gamma = 2^{-1/2}$,

(5)  $d(W_n, \mathcal{N}) \le C\gamma^n.$

The function (4), and (2) with $F(\mathbf{1}_4) = 1$, are examples of averaging functions, that is, functions $F: \mathcal{D}^k \to \mathcal{D}$ which satisfy the following three properties on their domain:

1. $\min_i x_i \le F(\mathbf{x}) \le \max_i x_i$.

2. $F(\mathbf{x}) \le F(\mathbf{y})$ whenever $x_i \le y_i$.

3. For all $x < y$ and for any two distinct indices $i_1 \ne i_2$, there exist $x_i \in \{x, y\}$, $i = 1, \ldots, k$, such that $x_{i_1} = x$, $x_{i_2} = y$ and $x < F(\mathbf{x}) < y$.

We note that the function $F(\mathbf{x}) = \min_i x_i$ satisfies the first two properties but not the third, and gives rise to nonnormal limiting behavior. We will call $F(\mathbf{x})$ a scaled averaging function if $F(\mathbf{x})/F(\mathbf{1}_k)$ is averaging.

Normal limits in [13] are proved for the sequences $X_n$ determined by the recursion (1) when the function $F(\mathbf{x})$ is averaging by showing that such recursions can be treated as the approximate linear recursion around the mean $c_n = EX_n$ with small perturbation $Z_n$,

(6)  $X_{n+1} = \boldsymbol{\alpha}_n \cdot \mathbf{X}_n + Z_n, \quad n \ge 0,$

where $\boldsymbol{\alpha}_n = F'(\mathbf{c}_n)$, $\mathbf{c}_n = (c_n, \ldots, c_n)^T \in \mathbb{R}^k$ and $F'$ is the gradient of $F$. In Section 3 we prove Theorem 3.1, which gives the exponential bound (5) for the distance to the normal for sequences generated by the approximate linear recursion (6) under moment Conditions 3.1 and 3.2, which guarantee that $Z_n$ is small relative to $X_n$.

In Section 4 we prove Theorem 1.1, which shows that the normal convergence of the hierarchical sequence $X_n$ holds with the exponential bound (5) under mild conditions, and specifies $\gamma$ in an explicit range. Theorem 1.1 is proved by invoking Theorem 3.1 after showing that the required moment conditions are satisfied for averaging functions. In particular, the higher-order moment Condition 3.2 used to prove the upper bound (5) is satisfied under the same averaging assumption on $F$ used in [13] to guarantee Condition 3.1 for convergence to the normal. The condition in Theorem 1.1 that the gradient $\boldsymbol{\alpha} = F'(\mathbf{c})$ of $F$ at the limiting value $\mathbf{c}$ not be a scalar multiple of a standard basis vector rules out trivial cases such as $F(x_1, x_2) = x_1$, for which normal limits are not valid.


Theorem 1.1. Let $X_0$ be a nonconstant random variable with $P(X_0 \in [a,b]) = 1$ and $X_n$ given by (1) with $F: [a,b]^k \to [a,b]$, twice continuously differentiable. Suppose $F$ is averaging, or scaled averaging and homogeneous, and that $X_n \to_p c$, with $\boldsymbol{\alpha} = F'(\mathbf{c})$ not a scalar multiple of a standard basis vector. Then with $W_n = (X_n - c_n)/\sqrt{\mathrm{Var}(X_n)}$ and $\mathcal{N}$ a standard normal variable, for all $\gamma \in (\varphi, 1)$ there exists $C$ such that

$d(W_n, \mathcal{N}) \le C\gamma^n,$

where

(7)  $\varphi = \frac{\sum_{i=1}^k |\alpha_i|^3}{\left(\sum_{i=1}^k |\alpha_i|^2\right)^{3/2}},$

a positive number strictly less than 1. The value $\varphi$ achieves a minimum of $1/\sqrt{k}$ if and only if the components of $\boldsymbol{\alpha}$ are equal.
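As a quick numerical companion to (7), the following minimal sketch evaluates $\varphi$ for a given gradient vector. The function name `phi` and the sample vectors are illustrative assumptions only.

```python
import math

def phi(alpha):
    """phi of (7): sum_i |a_i|^3 divided by (sum_i a_i^2)^(3/2)."""
    lam = math.sqrt(sum(a * a for a in alpha))
    return sum(abs(a) ** 3 for a in alpha) / lam ** 3

k = 4
phi_equal = phi([1.0 / k] * k)                  # equal components: the minimum 1/sqrt(k)
phi_skewed = phi([0.97, 0.01, 0.01, 0.01])      # near a standard basis vector: close to 1
```

Since (7) is invariant under rescaling of $\boldsymbol{\alpha}$, only the relative sizes of the components matter.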


At stage $n$ there are $N = k^n$ variables, so achieving the rate $\gamma^n$ for $\gamma$ to just within its minimum value $1/\sqrt{k}$ corresponds to the rate $N^{-1/2+\epsilon}$ for every $\epsilon > 0$. On the other hand, when $\boldsymbol{\alpha}$ is close to a standard basis vector, $\varphi$ is close to 1, and the rate $\gamma^n$ is slow. This is anticipated, as for the hierarchical sequence generated using the function, say, $F(x_1, x_2) = (1 - \epsilon)x_1 + \epsilon x_2$ for small $\epsilon > 0$, convergence to the normal will be slow.

In Section 5, Theorem 1.1 is applied to the hierarchical variables generated by the diamond lattice conductivity function (2). In (47) the value $\varphi$ determining the range of $\gamma$ in (5) for the rate of convergence to the normal is given as an explicit function of the weights $\mathbf{w}$; for the diamond lattice all rates $N^{-\theta}$ for $\theta \in (0, 1/2)$ are exhibited. Interestingly, there appears to be no such formula, simple or otherwise, for the limiting mean or variance of the sequence $X_n$.

We prove our results using Stein's method (see, e.g., [9]) in conjunction with the zero bias coupling of [1], derived from similar use of the size bias coupling in [2]. Let $Z$ be a mean zero, variance $\sigma^2$ normal variate and $Nh = Eh(Z/\sigma)$ for a test function $h$. Given a mean $c$, variance $\sigma^2$ random variable $X$, Stein's method, as typically applied, estimates $Eh((X - c)/\sigma) - Nh$ using the auxiliary function $f$ which is the bounded solution to

(8)  $h(w/\sigma) - Nh = \sigma^2 f'(w) - w f(w).$

In [1] it is shown that for any mean zero, variance $\sigma^2$ random variable $W$ there exists $W^*$ such that, for all absolutely continuous $f$ for which $EWf(W)$ exists,

(9)  $EWf(W) = \sigma^2 Ef'(W^*),$

and that $W$ is normal if and only if $W =_d W^*$. Hence, the distance from $W$ to the normal can be expressed in a distance $d$ from $W$ to $W^*$. The variable $W^*$ is termed the $W$-zero biased distribution due to parallels with size biasing. In both size biasing and zero biasing, a sum of independent variables is biased by choosing a summand at random and replacing it with its biased version. In size biasing the variables must be nonnegative, and one is chosen with probability proportional to its expectation. In zero biasing the variables are mean zero, and one is chosen with probability proportional to its variance. The coupling construction for zero biasing just stated appears in [1] and is presented formally in Section 3; it provides the key in the proof of Lemma 2.2. To see how the zero-bias coupling is used in the Stein equation, let $f$ and $h$ be related through (8). Evaluating (8) at a mean zero, variance $\sigma^2$ variable $W$, taking expectation and using (9), we obtain

(10)  $\sigma^2 |Ef'(W) - Ef'(W^*)| = |E[\sigma^2 f'(W) - Wf(W)]| = |Eh(W/\sigma) - Nh|.$

For $d$ the Wasserstein distance (also known as the Dudley, Fortet-Mourier or Kantorovich distance), Lemma 2.1 applies (10) to show the following strong connection between normal approximation and the distance between the $W$ and $W^*$ distributions as measured by $d$. With $\mathcal{N}$ a mean zero normal variable with the same variance as $W$,

(11)  $d(W, \mathcal{N}) \le 2d(W, W^*).$

Hence, bounds on the distance between $W$ and $W^*$ can be used to bound the distance from $W$ to the normal.
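The characterizing identity (9) can be checked numerically in a simple case. The sketch below assumes, as an illustration not taken from this paper, that $W$ takes the values $\pm 1$ with probability $1/2$ each; its zero biased distribution is then uniform on $[-1, 1]$, since $\frac{1}{2}\int_{-1}^{1} f'(u)\,du = \frac{1}{2}(f(1) - f(-1)) = EWf(W)$. The helper names and the test functions are hypothetical.

```python
import math

def lhs_rademacher(f):
    """E[W f(W)] for W = +1 or -1 with probability 1/2 each (mean 0, variance 1)."""
    return 0.5 * f(1.0) - 0.5 * f(-1.0)

def rhs_uniform(fprime, steps=20000):
    """sigma^2 E[f'(W*)] with sigma^2 = 1 and W* uniform on [-1, 1] (midpoint rule)."""
    h = 2.0 / steps
    total = sum(fprime(-1.0 + (i + 0.5) * h) for i in range(steps))
    return total * h / 2.0   # the uniform density on [-1, 1] is 1/2

# Pairs (f, f') of smooth test functions, chosen only for illustration.
pairs = [
    (lambda w: w ** 3, lambda w: 3.0 * w ** 2),
    (math.sin, math.cos),
    (math.exp, math.exp),
]
gaps = [abs(lhs_rademacher(f) - rhs_uniform(fp)) for f, fp in pairs]
```

For each pair the two sides of (9) agree up to quadrature error, which is consistent with $W^*$ being uniform in this example.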

We recall that, with

(12)  $\mathcal{L} = \{h: \mathbb{R} \to \mathbb{R}: |h(y) - h(x)| \le |y - x|\},$

the Wasserstein distance $d(Y, X)$ between variables $Y$ and $X$ on $\mathbb{R}$ is given by

$d(Y, X) = \sup_{h \in \mathcal{L}} |E(h(Y) - h(X))|,$

or equivalently, with

(13)  $\mathcal{F} = \{f: f \text{ absolutely continuous}, f(0) = f'(0) = 0, f' \in \mathcal{L}\},$

we have

(14)  $d(Y, X) = \sup_{f \in \mathcal{F}} |E(f'(Y) - f'(X))|.$

For $f \in \mathcal{F}$, certain growth restrictions are implied on $h$ of (8) for this $f$. In Theorem 3.1 these restrictions are used to compute a bound on $d(W_n, W_n^*)$, which in turn is used to bound $d(W_n, \mathcal{N})$ by (11). This argument, where $f$ is taken as given and then $h$ determined in terms of $f$ by (8), is reversed from the way Stein's method is typically applied, where $h$ is the function of interest and $f$ has only an auxiliary role as the solution of (8) for the given $h$.

For the application of Theorem 1.1, it is necessary to verify that the function $F(\mathbf{x})$ in (1) is averaging. Proposition 3 of [13] shows that the effective conductance of a resistor network is an averaging function of the conductances of its individual components. Theorem 1.2, proved in Section 6, provides an additional source of averaging functions to which Theorem 1.1 may be applied by introducing the notion of strict averaging and showing that it is preserved under certain compositions.

We say $F$ is strictly averaging if strict inequality holds in property 1 when $\min_i x_i < \max_i x_i$, and in property 2 when $x_i < y_i$ for some $i$. Property 3 is the least intuitive, but is a consequence of a strict version of the first two properties; that is, a strictly averaging function is averaging: if $x < y$ and $x_{i_1} = x$, $x_{i_2} = y$, then any assignment of the values $x, y$ to the remaining coordinates gives $x < F(\mathbf{x}) < y$ by the strict form of property 1, so $F$ satisfies property 3.

Theorem 1.2. Let $k \ge 1$ and set $I_0 = \{1, \ldots, k\}$. Suppose subsets $I_i \subset I_0$, $i \in I_0$, satisfy $\bigcup_{i \in I_0} I_i = I_0$. For $\mathbf{x} \in \mathbb{R}^k$ and $i \in I_0$ let $\mathbf{x}_i = (x_{j_1}, \ldots, x_{j_{|I_i|}})$, where $\{j_1, \ldots, j_{|I_i|}\} = I_i$ and $j_1 < \cdots < j_{|I_i|}$. Let $F_i: [0, \infty)^{|I_i|} \to [0, \infty)$ (or $F_i: \mathbb{R}^{|I_i|} \to \mathbb{R}$), $i = 0, \ldots, k$. If $F_0, F_1, \ldots, F_k$ are strictly averaging and $F_0$ is (positively) homogeneous, then the composition

$F_{\mathbf{s}}(\mathbf{x}) = F_0(s_1 F_1(\mathbf{x}_1), \ldots, s_k F_k(\mathbf{x}_k))$

is strictly averaging for any $\mathbf{s}$ for which $F_0(\mathbf{s}) = 1$ and $s_i > 0$ for all $i$. If $F_0, F_1, \ldots, F_k$ are scaled, strictly averaging and $F_0$ is (positively) homogeneous, then

$F_{\mathbf{1}}(\mathbf{x}) = F_0(F_1(\mathbf{x}_1), \ldots, F_k(\mathbf{x}_k))$

is a scaled strictly averaging function.

In particular, in the context of resistor networks, two components with conductances $x_1, x_2$ in parallel are equivalent to one component with conductance

$L_1(x_1, x_2) = x_1 + x_2,$

and in series to one component with conductance

$L_{-1}(x_1, x_2) = (x_1^{-1} + x_2^{-1})^{-1}.$

These parallel and series combination rules are the $p = 1$ and $p = -1$ special cases, with $w_i = 1$, of the weighted $L^p$-norm functions

$L_p^{\mathbf{w}}(\mathbf{x}) = \left(\sum_{i=1}^k (w_i x_i)^p\right)^{1/p}, \quad \mathbf{w} = (w_1, \ldots, w_k)^T, \quad w_i \in (0, \infty),$

which are scaled, strictly averaging and positively homogeneous on $[0, \infty)^k$ for $p > 0$, and on $(0, \infty]^k$ for $p < 0$.
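The weighted $L^p$ functions above are easy to experiment with. In the following minimal sketch the names `lp`, `parallel`, `series` and `diamond` are illustrative assumptions; it checks that the parallel and series rules are the $p = 1$ and $p = -1$ cases and that formula (2) of Section 1 is a parallel combination of two weighted series blocks.

```python
def lp(x, w, p):
    """Weighted L^p-type function (sum_i (w_i x_i)^p)^(1/p), for p != 0 and positive inputs."""
    return sum((wi * xi) ** p for wi, xi in zip(w, x)) ** (1.0 / p)

def parallel(x1, x2):
    """Parallel rule: the p = 1 case with unit weights, x1 + x2."""
    return lp((x1, x2), (1.0, 1.0), 1)

def series(x1, x2):
    """Series rule: the p = -1 case with unit weights, (1/x1 + 1/x2)^(-1)."""
    return lp((x1, x2), (1.0, 1.0), -1)

def diamond(x, w):
    """Formula (2) as L_1 applied to two weighted L_{-1} blocks."""
    return parallel(lp(x[:2], w[:2], -1), lp(x[2:], w[2:], -1))

x = (0.7, 1.3, 2.0, 0.4)   # hypothetical conductances
w = (1.5, 0.5, 1.0, 2.0)   # hypothetical positive weights
value = diamond(x, w)
```

Positive homogeneity can be observed directly: scaling every entry of `x` by a positive constant scales `lp(x, w, p)` by the same constant.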

Though Theorem 1.2 cannot be invoked to subsume the result of [13] that every resistor network is strictly averaging in its component conductances (e.g., consider the complete graph $K_4$), now suppressing the dependence of $L_p$ on $\mathbf{w}$, since $F(\mathbf{x})$ in (2) can be represented as

$F(\mathbf{x}) = L_1(L_{-1}(x_1, x_2), L_{-1}(x_3, x_4)),$

Theorem 1.2 applies to show that the diamond lattice conductivity function is a scaled, strictly averaging function on $(0, \infty)^4$ for any choice of positive weights. Moreover, Theorem 1.2 shows the same conclusion holds when the resistor parallel $L_1$ and series $L_{-1}$ combination rules in this network are replaced by, say, $L_2$ and $L_{-2}$, respectively.

2. Zero bias and the Wasserstein distance. The following lemma, of separate interest, shows how the zero bias coupling of $W$ upper bounds the Wasserstein distance to normality.

Lemma 2.1. Let $W$ be a mean zero, finite variance random variable, and let $W^*$ have the $W$-zero bias distribution. Then with $d$ the Wasserstein distance, and $\mathcal{N}$ a normal variable with the same variance as $W$,

$d(W, \mathcal{N}) \le 2d(W, W^*).$

Proof. Since $\sigma^{-1} d(X, Y) = d(\sigma^{-1}X, \sigma^{-1}Y)$ and $\sigma^{-1}W^* = (\sigma^{-1}W)^*$, we may assume $\mathrm{Var}(W) = 1$. The dual form of the Wasserstein distance gives that

(15)  $\inf E|Y - X| = d(Y, X),$

where the infimum, achieved for random variables on $\mathbb{R}$, is taken over all pairs $(Y, X)$ on a common space with the given marginals (see [6]). Take $W, W^*$ to achieve the infimum $d(W, W^*)$.

For a differentiable test function $h$ and $\sigma^2 = 1$, Stein [10] shows the solution $f$ of (8) is twice differentiable with $\|f''\| \le 2\|h'\|$, where $\|\cdot\|$ represents the supremum norm. Now going from right to left in (10), applying this bound and using (15) we have

$|Eh(W) - Nh| \le \|f''\| E|W - W^*| \le 2\|h'\| E|W - W^*| = 2\|h'\| d(W, W^*).$

Functions $h \in \mathcal{L}$ of (12) are absolutely continuous with $\|h'\| \le 1$, so taking the supremum over $h \in \mathcal{L}$ on the left-hand side completes the proof. □


The following results in this section give the prototype of the argument 
used in Section 3 and show how the zero bias coupling can be used to obtain 
the exponential decay of the Wasserstein distance to the normal. 


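Proposition 2.1. For $\boldsymbol{\alpha} \in \mathbb{R}^k$ with $\lambda = \|\boldsymbol{\alpha}\| \ne 0$, for all $p > 2$,

$\sum_{i=1}^k \left(\frac{|\alpha_i|}{\lambda}\right)^p \le 1,$

with equality if and only if $\boldsymbol{\alpha}$ is a multiple of a standard basis vector. In the case $p = 3$, yielding $\varphi$ of (7),

(16)  $\frac{1}{\sqrt{k}} \le \varphi \le 1,$

with equality to the upper bound if and only if $\boldsymbol{\alpha}$ is a multiple of a standard basis vector, and equality to the lower bound if and only if $|\alpha_i| = |\alpha_j|$ for all $i, j$. In addition, when $\alpha_i \ge 0$ with $\sum_i \alpha_i = 1$, then

(17)  $\lambda \le \varphi,$

with equality if and only if $\boldsymbol{\alpha}$ is equal to a standard basis vector.

Proof. Since $|\alpha_i|/\lambda \le 1$ we have $|\alpha_i|^{p-2}/\lambda^{p-2} \le 1$, yielding

$\sum_{i=1}^k \frac{|\alpha_i|^p}{\lambda^p} \le \sum_{i=1}^k \frac{|\alpha_i|^2}{\lambda^2} = 1,$

with equality if and only if for some $i$ we have $|\alpha_i| = \lambda$, and $\alpha_j = 0$ for all $j \ne i$. By Hölder's inequality with $p = 3$, $q = 3/2$, we have

$1 = \sum_{i=1}^k \frac{\alpha_i^2}{\lambda^2} \le k^{1/3}\left(\sum_{i=1}^k \left(\frac{|\alpha_i|}{\lambda}\right)^3\right)^{2/3} = k^{1/3}\varphi^{2/3},$

giving the lower bound (16), with equality if and only if $\alpha_i^2$ is proportional to 1 for all $i$. For the claim (17), by considering the inequality between the squared mean and the second moment of a random variable which takes the value $\alpha_i$ with probability $\alpha_i$, we have $\left(\sum_i \alpha_i^2\right)^2 \le \sum_i \alpha_i^3$, that is, $\lambda^4 \le \varphi\lambda^3$, with equality if and only if the variable is constant. □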

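Lemma 2.2 shows how zero biasing an independent sum behaves like a contraction mapping.

Lemma 2.2. For $\boldsymbol{\alpha} \in \mathbb{R}^k$ with $\lambda = \|\boldsymbol{\alpha}\| \ne 0$, let

$Y = \sum_{i=1}^k \frac{\alpha_i}{\lambda} W_i,$

where $W_i$ are mean zero, variance 1, independent random variables distributed as $W$. Then

$d(Y, Y^*) \le \varphi\, d(W, W^*)$

with $\varphi$ as in (7), and $\varphi < 1$ if and only if $\boldsymbol{\alpha}$ is not a multiple of a standard basis vector.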

Proof. By [1], for any collection $W_i^*$ with the $W_i$ zero biased distribution independent of $W_j$, $j \ne i$, and $I$ a random index independent of all other variables with distribution

$P(I = i) = \frac{\alpha_i^2}{\lambda^2},$

the variable

(18)  $Y^* = Y - \frac{\alpha_I}{\lambda}(W_I - W_I^*)$

has the $Y$ zero biased distribution. Since $W_i =_d W$, we may take $(W_i, W_i^*) =_d (W, W^*)$, with $W_i, W_i^*$ achieving the infimum in (15). Then

$|Y - Y^*| = \sum_{i=1}^k \frac{|\alpha_i|}{\lambda} |W_i - W_i^*| \mathbf{1}(I = i)$

and

$E|Y - Y^*| = \sum_{i=1}^k \frac{|\alpha_i|^3}{\lambda^3} E|W_i - W_i^*| = \varphi E|W - W^*|.$

Now using (15) to upper bound $d(Y, Y^*)$ by the particular coupling in (18) we obtain

$d(Y, Y^*) \le E|Y - Y^*| = \varphi E|W - W^*| = \varphi\, d(W, W^*).$

The final claim was shown in Proposition 2.1. □


In the classical case, when $Y = n^{-1/2}\sum_{i=1}^n W_i$, a normalized sum of i.i.d. random variables, applying Lemmas 2.1 and 2.2 with $\alpha_i = 1/\sqrt{n}$ gives $d(Y, \mathcal{N}) \le 2d(Y, Y^*) \le 2n^{-1/2} d(W, W^*) \to 0$ as $n \to \infty$, yielding a streamlined proof of the central limit theorem, complete with a bound in $d$.
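For the reader's convenience, the factor $n^{-1/2}$ in the classical case is simply the value of $\varphi$ of (7) for equal weights; the short computation below spells this out, with $k = n$ summands.

```latex
\alpha_i = n^{-1/2},\ i = 1,\ldots,n
\quad\Longrightarrow\quad
\lambda = \Big(\sum_{i=1}^n \alpha_i^2\Big)^{1/2} = 1,
\qquad
\varphi = \frac{\sum_{i=1}^n |\alpha_i|^3}{\lambda^3} = n \cdot n^{-3/2} = n^{-1/2}.
```

This agrees with the lower bound $1/\sqrt{k}$ of (16), which is attained exactly when all components have equal absolute value.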

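When the sequence $X_n$ is given by the recursion (6) with $Z_n = 0$, setting $\lambda_n = \|\boldsymbol{\alpha}_n\|$ and $\sigma_n^2 = \mathrm{Var}(X_n)$ we have $\sigma_{n+1} = \lambda_n\sigma_n$, and we can write (6) as

$W_{n+1} = \sum_{i=1}^k \frac{\alpha_{n,i}}{\lambda_n} W_{n,i} \quad \text{with} \quad W_n = \frac{X_n - c_n}{\sigma_n}.$

Iterating the bound provided by Lemma 2.2 gives

$d(W_n, W_n^*) \le \left(\prod_{i=0}^{n-1} \varphi_i\right) d(W_0, W_0^*),$

where

(19)  $\varphi_n = \frac{\sum_{i=1}^k |\alpha_{n,i}|^3}{\lambda_n^3}.$

When $\limsup_n \varphi_n = \varphi < 1$, for any $\gamma \in (\varphi, 1)$ there exists $C$ such that for all $n$ we have $d(W_n, \mathcal{N}) \le 2d(W_n, W_n^*) \le C\gamma^n$. In Section 3 we study the situation when $Z_n$ is not necessarily zero.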


3. Bounds to the normal for approximately linear recursions. In this section we study sequences $\{X_n\}_{n \ge 0}$ generated by the approximate linear recursion (6), and we present Theorem 3.1, which shows the exponential bound (5) holds when the perturbation term $Z_n$ is small as reflected in the term $\beta_n$ of (24), and holds in particular under the moment bounds in Conditions 3.1 and 3.2. When $Z_n$ is small, $X_{n+1}$ will be approximately equal to $\boldsymbol{\alpha}_n \cdot \mathbf{X}_n$, and therefore its variance $\sigma_{n+1}^2$ will be close to $\lambda_n^2\sigma_n^2$, where $\lambda_n = \|\boldsymbol{\alpha}_n\|$, and the ratio $(\lambda_n\sigma_n)/\sigma_{n+1}$ will be close to 1. Iterating, the variance of $X_n$ will grow like a constant $C^2$ times $\lambda_{n-1}^2 \cdots \lambda_0^2$, so when $c_n \to c$ and $\boldsymbol{\alpha}_n \to \boldsymbol{\alpha}$, like $C^2\lambda^{2n}$. Condition 3.1 assures that $Z_n$ is small relative to $X_n$ in that its variance grows at a slower rate. This condition was assumed in [13] for deriving a normal limiting law for the standardized sequence generated by (6).

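Condition 3.1. The nonzero sequence of vectors $\boldsymbol{\alpha}_n \in \mathbb{R}^k$, $k \ge 2$, converges to $\boldsymbol{\alpha}$, not equal to any multiple of a standard basis vector. For $\lambda = \|\boldsymbol{\alpha}\|$, there exist $0 < \delta_1 < \delta_2 < 1$ and constants $C_{Z,2}, C_{X,2}$ such that, for all $n$,

$\mathrm{Var}(Z_n) \le C_{Z,2}^2 \lambda^{2n}(1 - \delta_2)^{2n},$

$\mathrm{Var}(X_n) \ge C_{X,2}^2 \lambda^{2n}(1 - \delta_1)^{2n}.$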

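Bounds on the distance between $X_n$ and the normal can be provided under the following conditions on the fourth-order moments of $X_n$ and $Z_n$.

Condition 3.2. There exist $\delta_3$ and $\delta_4 \in (\delta_1, 1)$ and constants $C_{Z,4}, C_{X,4}$ such that

$E(Z_n - EZ_n)^4 \le C_{Z,4}^4 \lambda^{4n}(1 - \delta_4)^{4n},$

$E(X_n - EX_n)^4 \le C_{X,4}^4 \lambda^{4n}(1 + \delta_3)^{4n}$

and

(20)  $\beta = \max\{\phi_1, \phi_2\} < 1, \quad \text{where} \quad \phi_1 = \frac{(1 - \delta_2)(1 + \delta_3)^3}{(1 - \delta_1)^4} \quad \text{and} \quad \phi_2 = \left(\frac{1 - \delta_4}{1 - \delta_1}\right)^2.$

Using Hölder's inequality and Condition 3.2 we may take

(21)  $1 - \delta_2 \le 1 - \delta_4 < 1 - \delta_1 \le 1 + \delta_3.$

In particular, $\beta \le \eta$ for

(22)  $\eta = \frac{(1 - \delta_4)(1 + \delta_3)^3}{(1 - \delta_1)^4}.$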

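Theorem 3.1. Let $X_{n+1} = \boldsymbol{\alpha}_n \cdot \mathbf{X}_n + Z_n$ with $\lambda_n = \|\boldsymbol{\alpha}_n\| \ne 0$ and $\mathbf{X}_n$ a vector in $\mathbb{R}^k$ with i.i.d. components distributed as $X_n$ with mean $c_n$ and nonzero variance $\sigma_n^2$. Set

(23)  $W_n = \frac{X_n - c_n}{\sigma_n}, \quad Y_n = \sum_{i=1}^k \frac{\alpha_{n,i}}{\lambda_n} W_{n,i}$

and

(24)  $\beta_n = E|W_{n+1} - Y_n| + \frac{1}{2}E|W_{n+1}^3 - Y_n^3|.$

If there exist $(\beta, \varphi) \in (0,1)^2$ such that

(25)  $\limsup_{n \to \infty} \frac{\beta_n}{\beta^n} < \infty$

and $\varphi_n$ of (19) satisfies

(26)  $\limsup_{n \to \infty} \varphi_n = \varphi,$

then with $\gamma = \beta$ when $\varphi < \beta$, and for any $\gamma \in (\varphi, 1)$ when $\beta \le \varphi$, there exists $C$ such that

(27)  $d(W_n, \mathcal{N}) \le C\gamma^n.$

Under Conditions 3.1 and 3.2, (27) holds for all $\gamma \in (\max(\beta, \varphi), 1)$, with $\beta$ as in (20), and $\varphi = \sum_{i=1}^k |\alpha_i|^3/\lambda^3 < 1$.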

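Proof. By Lemma 2.1, it suffices to prove the bound (27) holds for $d(W_n, W_n^*)$. Let $f \in \mathcal{F}$ with $\mathcal{F}$ given by (13). Then $|f''(x)| \le 1$, $|f'(x)| \le |x|$, $|f(x)| \le x^2/2$, and for $h$ given by (8) with $\sigma^2 = 1$ and the chosen $f$, differentiation of (8) yields

$h'(w) = f''(w) - wf'(w) - f(w),$

and therefore

(28)  $|h'(w)| \le 1 + \frac{3}{2}w^2.$

Letting $r_n = (\lambda_n\sigma_n)/\sigma_{n+1}$ and using (23), write $X_{n+1} = \boldsymbol{\alpha}_n \cdot \mathbf{X}_n + Z_n$ as

(29)  $W_{n+1} = r_n Y_n + T_n, \quad \text{where} \quad T_n = \frac{Z_n - EZ_n}{\sigma_{n+1}}.$

Now by (28) and the definition of $\beta_n$ in (24),

$|Eh(W_{n+1}) - Eh(Y_n)| = \left|E\int_{Y_n}^{W_{n+1}} h'(u)\,du\right| \le E\left|\int_{Y_n}^{W_{n+1}} \left(1 + \frac{3}{2}u^2\right)du\right| \le \beta_n.$

From (10) with $\sigma^2 = 1$, using $\mathrm{Var}(Y_n) = 1$,

$|Ef'(W_{n+1}) - Ef'(W_{n+1}^*)| = |Eh(W_{n+1}) - Nh|$
$\quad = |E(h(W_{n+1}) - h(Y_n) + h(Y_n) - Nh)|$
$\quad \le \beta_n + |Eh(Y_n) - Nh|$
$\quad = \beta_n + |E(f'(Y_n) - f'(Y_n^*))|$
$\quad \le \beta_n + d(Y_n, Y_n^*)$  [by (14)]
$\quad \le \beta_n + \varphi_n d(W_n, W_n^*)$  (by Lemma 2.2).

Taking the supremum over $f \in \mathcal{F}$ on the left-hand side, using (14) again and letting $d_n = d(W_n, W_n^*)$ we obtain, for all $n \ge 0$,

$d_{n+1} \le \beta_n + \varphi_n d_n.$

Iteration yields that, for all $n, n_0 \ge 0$,

(30)  $d_{n_0+n} \le \sum_{j=n_0}^{n_0+n-1} \left(\prod_{i=j+1}^{n_0+n-1} \varphi_i\right)\beta_j + \left(\prod_{i=n_0}^{n_0+n-1} \varphi_i\right) d_{n_0}.$

Now suppose the bounds (25) and (26) hold and recall the choice of $\gamma$. When $\varphi < \beta$ take $\bar{\varphi} \in (\varphi, \beta)$ so that $\varphi < \bar{\varphi} < \beta = \gamma$; when $\beta \le \varphi$ set $\bar{\varphi} \in (\varphi, \gamma)$ so that $\beta \le \varphi < \bar{\varphi} < \gamma$. Then for any $B$ greater than the limsup in (25) there exists $n_0$ such that, for all $n \ge n_0$,

$\beta_n \le B\beta^n \quad \text{and} \quad \varphi_n \le \bar{\varphi}.$

Applying these inequalities in (30) and summing yields, for all $n \ge 0$,

$d_{n+n_0} \le B\beta^{n_0}\,\frac{\beta^n - \bar{\varphi}^n}{\beta - \bar{\varphi}} + \bar{\varphi}^n d_{n_0};$

since $\max(\beta, \bar{\varphi}) \le \gamma$, (27) follows.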
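The iteration step can be illustrated numerically. The sketch below, with hypothetical constants `B`, `beta`, `phi_bar` and `d0`, runs the extremal recursion $d_{n+1} = \beta_n + \bar{\varphi} d_n$ with $\beta_n = B\beta^n$ (taking $n_0 = 0$), compares it with the geometric sum obtained from (30), and confirms a bound of the form $C\gamma^n$ with $\gamma = \max(\beta, \bar{\varphi})$.

```python
def iterate(beta_seq, phi_seq, d0):
    """Run d_{n+1} = beta_n + phi_n * d_n, the worst case allowed by the inequality."""
    d = [d0]
    for b, p in zip(beta_seq, phi_seq):
        d.append(b + p * d[-1])
    return d

def closed_form(n, B, beta, phi_bar, d0):
    """Value of (30) with n_0 = 0, beta_j = B * beta**j and phi_i = phi_bar, summed in closed form."""
    return B * (beta ** n - phi_bar ** n) / (beta - phi_bar) + phi_bar ** n * d0

B, beta, phi_bar, d0 = 2.0, 0.8, 0.6, 1.0   # hypothetical constants with phi_bar < beta
N = 40
d = iterate([B * beta ** n for n in range(N)], [phi_bar] * N, d0)
gamma = max(beta, phi_bar)
C = max(d[n] / gamma ** n for n in range(N + 1))   # smallest C that works for n <= N
```

With these values the ratio $d_n/\gamma^n$ increases toward $B/(\beta - \bar{\varphi}) = 10$, so a single constant $C$ serves for all $n$.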

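To prove the final claim it suffices to show that, under Conditions 3.1 and 3.2, (25) and (26) hold with $\beta < 1$ as defined in (20), and with $\varphi = \sum_{i=1}^k |\alpha_i|^3/\lambda^3 < 1$. Lemma 6 of [13] gives that the limit as $n \to \infty$ of $\sigma_n/(\lambda_0 \cdots \lambda_{n-1})$ exists in $(0, \infty)$, and therefore

(31)  $\lim_{n \to \infty} r_n = 1 \quad \text{and} \quad \lim_{n \to \infty} \frac{\sigma_{n+1}}{\sigma_n} = \lambda.$

Referring to the definition of $T_n$ in (29) and using (31) and Conditions 3.1 and 3.2, there exist $C_{T,2}, C_{T,4}$ such that

$(E|T_n|)^2 \le ET_n^2 = \mathrm{Var}(T_n) = \frac{\mathrm{Var}(Z_n)}{\sigma_{n+1}^2} = \left(\frac{\sigma_n}{\sigma_{n+1}}\right)^2 \frac{\mathrm{Var}(Z_n)}{\mathrm{Var}(X_n)} \le C_{T,2}^2\left(\frac{1 - \delta_2}{1 - \delta_1}\right)^{2n},$

$ET_n^4 = \frac{E(Z_n - EZ_n)^4}{\sigma_{n+1}^4} = \left(\frac{\sigma_n}{\sigma_{n+1}}\right)^4 \frac{E(Z_n - EZ_n)^4}{\mathrm{Var}(X_n)^2} \le C_{T,4}^4\left(\frac{1 - \delta_4}{1 - \delta_1}\right)^{4n}.$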

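By independence, a simple bound and Condition 3.2 for the inequality,

$(E|Y_n|)^2 \le EY_n^2 = \mathrm{Var}(Y_n) = 1,$

$EY_n^4 \le 6E\left(\frac{X_n - c_n}{\sigma_n}\right)^4 \le 6\left(\frac{C_{X,4}}{C_{X,2}}\right)^4\left(\frac{1 + \delta_3}{1 - \delta_1}\right)^{4n}.$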

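From (6), with $\sigma_{Z_n} = \sqrt{\mathrm{Var}(Z_n)}$, $\sigma_{n+1} \le \lambda_n\sigma_n + \sigma_{Z_n}$ and $\lambda_n\sigma_n \le \sigma_{n+1} + \sigma_{Z_n}$; hence with $C_{r,1} = C_{T,2}$ we have

$|\lambda_n\sigma_n - \sigma_{n+1}| \le \sigma_{Z_n}, \quad \text{so} \quad |r_n - 1| \le C_{r,1}\left(\frac{1 - \delta_2}{1 - \delta_1}\right)^n.$

Since $|r_n^p - 1| \le \sum_{j=1}^p \binom{p}{j}|r_n - 1|^j$, using (21) there are $C_{r,p}$ such that

$|r_n^p - 1| \le C_{r,p}\left(\frac{1 - \delta_2}{1 - \delta_1}\right)^n, \quad p = 1, 2, \ldots.$

Now considering the first term of $\beta_n$ of (24), recalling (29),

$E|W_{n+1} - Y_n| = E|(r_n - 1)Y_n + T_n| \le |r_n - 1|E|Y_n| + E|T_n| \le (C_{r,1} + C_{T,2})\left(\frac{1 - \delta_2}{1 - \delta_1}\right)^n,$

which is upper bounded by a constant times $\phi_1^n$.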

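For the second term of (24) we have

$E|W_{n+1}^3 - Y_n^3| = E|(r_n^3 - 1)Y_n^3 + 3r_n^2 Y_n^2 T_n + 3r_n Y_n T_n^2 + T_n^3|.$

Using the triangle inequality, the first term is bounded by a constant times $\phi_1^n$ as

$|r_n^3 - 1|E|Y_n^3| \le |r_n^3 - 1|(EY_n^4)^{3/4} \le C_{r,3}\, 6^{3/4}\left(\frac{C_{X,4}}{C_{X,2}}\right)^3\left(\frac{1 - \delta_2}{1 - \delta_1}\right)^n\left(\frac{1 + \delta_3}{1 - \delta_1}\right)^{3n}.$

Since $r_n \to 1$ by (31), it suffices to bound the next two terms without the factor of $r_n$. Thus,

$E|Y_n^2 T_n| \le \sqrt{EY_n^4\, ET_n^2} \le \sqrt{6}\left(\frac{C_{X,4}}{C_{X,2}}\right)^2 C_{T,2}\left(\frac{1 + \delta_3}{1 - \delta_1}\right)^{2n}\left(\frac{1 - \delta_2}{1 - \delta_1}\right)^n,$

which is less than a constant times $\phi_1^n$ by (21), and finally,

$E|Y_n T_n^2| \le \sqrt{EY_n^2\, ET_n^4} \le C_{T,4}^2\left(\frac{1 - \delta_4}{1 - \delta_1}\right)^{2n} = C_{T,4}^2\phi_2^n,$

$E|T_n^3| \le (ET_n^4)^{3/4} \le C_{T,4}^3\left(\frac{1 - \delta_4}{1 - \delta_1}\right)^{3n} \le C_{T,4}^3\phi_2^n.$

Hence (25) holds with the given $\beta$.

Since $\boldsymbol{\alpha}_n \to \boldsymbol{\alpha}$, we have $\varphi_n \to \varphi$. Under Condition 3.1, $\boldsymbol{\alpha}$ is not a scalar multiple of a standard basis vector and $\varphi < 1$ by Lemma 2.2. We finish by invoking the first part of the theorem. □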

4. Normal bounds for hierarchical sequences. The following result, extending Proposition 9 of [13] to higher orders, is used to show that the moment bounds of Conditions 3.1 and 3.2 are satisfied under the hypotheses of Theorem 1.1, so that Theorem 3.1 may be invoked. The dependence of the constants in (33) and (34) on $\epsilon$ is suppressed for notational simplicity.

Proposition 4.1. Let the hypotheses of Theorem 1.1 hold. Following (6), with $c_n = EX_n$ and $\boldsymbol{\alpha}_n = F'(\mathbf{c}_n)$, define

(32)  $Z_n = F(\mathbf{X}_n) - \boldsymbol{\alpha}_n \cdot \mathbf{X}_n.$

Then with $\boldsymbol{\alpha}$ the limit of $\boldsymbol{\alpha}_n$ and $\lambda = \|\boldsymbol{\alpha}\|$, for any $p \ge 1$ and $\epsilon > 0$, there exist constants $C_{X,p}, C_{Z,p}$ such that

(33)  $E|Z_n - EZ_n|^p \le C_{Z,p}^p(\lambda + \epsilon)^{2pn} \quad \text{for all } n \ge 0$

and

(34)  $E|X_n - c_n|^p \le C_{X,p}^p(\lambda + \epsilon)^{pn} \quad \text{for all } n \ge 0.$


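Proof. Expanding $F(\mathbf{X}_n)$ around $\mathbf{c}_n$, with $\boldsymbol{\alpha}_n = F'(\mathbf{c}_n)$,

(35)  $F(\mathbf{X}_n) = F(\mathbf{c}_n) + \sum_{i=1}^k \alpha_{n,i}(X_{n,i} - c_n) + R_2(\mathbf{c}_n, \mathbf{X}_n),$

where

$R_2(\mathbf{c}_n, \mathbf{X}_n) = \sum_{i,j=1}^k \int_0^1 (1 - t)\frac{\partial^2 F}{\partial x_i \partial x_j}(\mathbf{c}_n + t(\mathbf{X}_n - \mathbf{c}_n))(X_{n,i} - c_n)(X_{n,j} - c_n)\,dt.$

Since the second partials of $F$ are continuous on $\mathcal{D} = [a,b]^k$, with $\|\cdot\|$ the supremum norm on $\mathcal{D}$, $B = 2^{-1}\max_{i,j}\|\partial^2 F/\partial x_i \partial x_j\| < \infty$, we have

(36)  $|R_2(\mathbf{c}_n, \mathbf{X}_n)| \le B\sum_{i,j=1}^k |(X_{n,i} - c_n)(X_{n,j} - c_n)|.$

Using (32), (35) and (36), we have, for all $p \ge 1$,

(37)  $E|Z_n - EZ_n|^p = E\left|F(\mathbf{X}_n) - c_{n+1} - \sum_{i=1}^k \alpha_{n,i}(X_{n,i} - c_n)\right|^p = E|F(\mathbf{c}_n) - c_{n+1} + R_2(\mathbf{c}_n, \mathbf{X}_n)|^p$
$\quad \le 2^{p-1}\left[|F(\mathbf{c}_n) - c_{n+1}|^p + B^p E\left(\sum_{i,j}|(X_{n,i} - c_n)(X_{n,j} - c_n)|\right)^p\right].$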

For the first term of (37), again using (36),

(38)  $|F(\mathbf{c}_n) - c_{n+1}|^p = |F(\mathbf{c}_n) - EF(\mathbf{X}_n)|^p = |ER_2(\mathbf{c}_n, \mathbf{X}_n)|^p$
$\quad \le B^p k^p\left(E\sum_{i=1}^k (X_{n,i} - c_n)^2\right)^p$
$\quad \le B^p k^{2p}[E(X_n - c_n)^2]^p$
$\quad \le B^p k^{2p} E(X_n - c_n)^{2p},$

using Hölder's inequality for the final step.

Similarly, for the expectation in (37),

(39)  $E\left(\sum_{i,j}|(X_{n,i} - c_n)(X_{n,j} - c_n)|\right)^p \le k^p E\left(\sum_{i=1}^k (X_{n,i} - c_n)^2\right)^p \le k^{2p-1}E\left(\sum_{i=1}^k (X_{n,i} - c_n)^{2p}\right) = k^{2p}E(X_n - c_n)^{2p}.$

Applying (38) and (39) in (37) we obtain for all $p \ge 1$, with $C_p = 2^p B^p k^{2p}$,

(40)  $E|Z_n - EZ_n|^p \le C_p E(X_n - c_n)^{2p}.$

It therefore suffices to prove (34) to demonstrate the proposition.

In Lemma 8 of [13], it is shown that when $F: [a,b]^k \to [a,b]$ is an averaging function and there exists $c \in [a,b]$ such that $X_n \to_p c$, then

(41)  for every $\epsilon \in (0,1)$ there exists $M$ such that, for all $n$, $P(|X_n - c| > \epsilon) \le M\epsilon^n$.

Hence the large deviation estimate (41) holds under the given assumptions, and so also with $c_n$ replacing $c$ when $c_n \to c$. Since $X_n \in [a,b]$ and $X_n \to_p c$, $c_n = EX_n \to c$ by the bounded convergence theorem.

We now show that if $a_n$, $n = 0, 1, \ldots$, is a sequence such that for every $\epsilon > 0$ there exist $M$ and $n_0$ such that, for all $n \ge n_0$,

(42)  $a_{n+1} \le (\lambda + \epsilon)^p a_n + M(\lambda + \epsilon)^{p(n+1)},$

then for all $\epsilon > 0$ there exists $C$ such that

(43)  $a_n \le C(\lambda + \epsilon)^{pn} \quad \text{for all } n.$

Let $\epsilon > 0$ be given, and let $M$ and $n_0$ be such that (42) holds with $\epsilon$ replaced by $\epsilon/2$. Setting

$\rho = 1 - \left(\frac{\lambda + \epsilon/2}{\lambda + \epsilon}\right)^p \quad \text{and} \quad C = \max\left\{\frac{a_{n_0}}{(\lambda + \epsilon)^{pn_0}},\ \frac{M}{\rho}\left(\frac{\lambda + \epsilon/2}{\lambda + \epsilon}\right)^{p(n_0+1)}\right\},$

it is trivial that (43) holds for $n = n_0$. Since the second quantity in the maximum decreases when $n_0$ is replaced by $n \ge n_0$, induction shows (43) holds for all $n \ge n_0$. By increasing $C$ if necessary, we have that (43) holds for all $n$.
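The passage from (42) to (43) can be sanity-checked on an extremal sequence. In the sketch below all numerical values (`lam`, `eps`, `p`, `M`, `n0`) are hypothetical; the sequence satisfies (42) with equality when $\epsilon$ is replaced by $\epsilon/2$, and the constant is the one defined in the proof.

```python
def constant_C(a_n0, n0, M, lam, eps, p):
    """The constant C of the proof, assuming (42) holds with eps replaced by eps/2."""
    q = ((lam + eps / 2) / (lam + eps)) ** p
    rho = 1.0 - q
    return max(a_n0 / (lam + eps) ** (p * n0), (M / rho) * q ** (n0 + 1))

lam, eps, p, M, n0 = 0.5, 0.2, 2, 3.0, 0   # hypothetical values
a = [1.0]
for n in range(60):
    # Equality in (42) with eps/2 in place of eps: the largest sequence the hypothesis allows.
    a.append((lam + eps / 2) ** p * a[n] + M * (lam + eps / 2) ** (p * (n + 1)))

C = constant_C(a[n0], n0, M, lam, eps, p)
ok = all(a[n] <= C * (lam + eps) ** (p * n) * (1 + 1e-12) for n in range(len(a)))
```

Here `ok` records whether (43) holds along the whole computed sequence; the small slack factor only guards against rounding.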

Unqualified statements in the remainder of the proof below involving $\epsilon$ and $M$ are to be read to mean that for every $\epsilon > 0$ there exists $M$, not necessarily the same at each occurrence, such that the statement holds for all $n$. By (41),

$E(X_n - c_n)^{2p} = E[(X_n - c_n)^{2p}; |X_n - c_n| \le \epsilon] + E[(X_n - c_n)^{2p}; |X_n - c_n| > \epsilon] \le \epsilon^p E|X_n - c_n|^p + M\epsilon^n,$

so from (40),

(44)  $E|Z_n - EZ_n|^p \le \epsilon E|X_n - c_n|^p + M\epsilon^n.$

Since for all $w, z$,

$|w + z|^p \le (1 + \epsilon)|w|^p + M|z|^p,$

definition (32) yields

(45)  $E|X_{n+1} - c_{n+1}|^p \le (1 + \epsilon)E\left|\sum_{i=1}^k \alpha_{n,i}(X_{n,i} - c_n)\right|^p + ME|Z_n - EZ_n|^p.$

Specializing (45) to the case $p = 2$ gives, for all $n$ sufficiently large,

$E(X_{n+1} - c_{n+1})^2 \le (\lambda + \epsilon)^2 E(X_n - c_n)^2 + ME(Z_n - EZ_n)^2.$

Applying (44) with $p = 2$ to this inequality yields, for all $n$ sufficiently large,

$E(X_{n+1} - c_{n+1})^2 \le (\lambda + \epsilon)^2 E(X_n - c_n)^2 + M\epsilon^n \le (\lambda + \epsilon)^2 E(X_n - c_n)^2 + M(\lambda + \epsilon)^{2(n+1)}.$

Hence, with $p = 2$, (42) and therefore (43) are true for $a_n = E(X_n - c_n)^2$, yielding (34) for $p = 2$. Now apply Hölder's inequality to prove the case $p = 1$.

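Assume now that (34) is true for all $2 \le q < p$ in order to induct on $p$. Expand the first term in (45), letting $\mathbf{p} = (p_1, \ldots, p_k)$ and $|\mathbf{p}| = \sum_i p_i$. Use the induction hypotheses, and Proposition 2.1 in (46), to obtain for all $n$ sufficiently large, with $A_{X,p} = \max_{q < p} C_{X,q}$ and $\binom{p}{\mathbf{p}}$ the multinomial coefficient,

(46)  $E\left|\sum_{i=1}^k \alpha_{n,i}(X_{n,i} - c_n)\right|^p \le E\left(\sum_{i=1}^k |\alpha_{n,i}||X_{n,i} - c_n|\right)^p$
$\quad \le E|X_n - c_n|^p\sum_{i=1}^k |\alpha_{n,i}|^p + \sum_{|\mathbf{p}| = p,\ p_i < p}\binom{p}{\mathbf{p}}\prod_{i=1}^k |\alpha_{n,i}|^{p_i} C_{X,p_i}^{p_i}(\lambda + \epsilon)^{p_i n}$
$\quad \le E|X_n - c_n|^p\sum_{i=1}^k |\alpha_{n,i}|^p + A_{X,p}^p(\lambda + \epsilon)^{pn}\sum_{|\mathbf{p}| = p}\binom{p}{\mathbf{p}}\prod_{i=1}^k |\alpha_{n,i}|^{p_i}$
$\quad = E|X_n - c_n|^p\sum_{i=1}^k |\alpha_{n,i}|^p + A_{X,p}^p(\lambda + \epsilon)^{pn}\left(\sum_{i=1}^k |\alpha_{n,i}|\right)^p$
$\quad \le \sum_{i=1}^k |\alpha_{n,i}|^p\left(E|X_n - c_n|^p + M(\lambda + \epsilon)^{pn}\right)$
$\quad \le (\lambda + \epsilon)^p E|X_n - c_n|^p + M(\lambda + \epsilon)^{p(n+1)}.$

Applying (44) and (46) to (45) gives

$E|X_{n+1} - c_{n+1}|^p \le (\lambda + \epsilon)^p E|X_n - c_n|^p + M(\lambda + \epsilon)^{p(n+1)},$

from which we can conclude (43) for $a_n = E|X_n - c_n|^p$, completing the induction on $p$. We conclude (34) holds for all $p \ge 1$. □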

Proof of Theorem 1.1. By replacing $X_n$ and $F(\mathbf{x})$ by $X_n/F(\mathbf{1}_k)^n$ and $F(\mathbf{x})/F(\mathbf{1}_k)$, respectively, we may assume $F$ is averaging. By property 1 of averaging functions, $F(c\mathbf{1}_k) = c$, and differentiation yields $\sum_i \alpha_i = 1$. By property 2, monotonicity, $\alpha_i \ge 0$, and (17) of Proposition 2.1 yields $0 < \lambda < \varphi < 1$.

Inspection of (22) shows that, for any $\eta \in (\lambda, 1)$, there exist $\delta_1$ and $\delta_3$ in $(0,1)$ and $\delta_4$ in $(\delta_1, 1 - \lambda)$ yielding $\eta$. For example, to achieve values arbitrarily close to $\lambda$ from above, take $\delta_1$ and $\delta_3$ close to zero and $\delta_4$ close to $1 - \lambda$ from below. Set $\delta_2 = \delta_4$. By Theorem 3.1 it suffices to show that Conditions 3.1 and 3.2 are satisfied for these choices of $\delta$.

Since $\delta_4 < 1 - \lambda$ we have $\lambda^2 < \lambda(1 - \delta_4)$; hence we may pick $\epsilon > 0$ such that $(\lambda + \epsilon)^2 < \lambda(1 - \delta_4)$. By Proposition 4.1, for $p = 2$ and $p = 4$, for this $\epsilon$ there exists $C_{Z,p}$ such that

$E(Z_n - EZ_n)^p \le C_{Z,p}^p(\lambda + \epsilon)^{2pn} \le C_{Z,p}^p\lambda^{pn}(1 - \delta_4)^{pn}.$

Hence the fourth and second moment bounds on $Z_n$ are satisfied with $\delta_4$ and $\delta_2 = \delta_4$, respectively.

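Proposition 10 of [13] shows that under the assumptions of Theorem 1.1, for every $\epsilon > 0$ there exists $C_{X,2}$ such that

$\mathrm{Var}(X_n) \ge C_{X,2}^2(\lambda - \epsilon)^{2n}.$

Taking $\epsilon = \lambda\delta_1$, we have that $\mathrm{Var}(X_n)$ satisfies its lower bound condition. Lastly, applying Proposition 4.1 with $p = 4$ and $\epsilon = \lambda\delta_3$ we see the fourth moment bound on $X_n$ is satisfied, and the proof is complete. □

5. Convergence rates for the diamond lattice. We now apply Theorem 1.1 to hierarchical sequences generated by the diamond lattice conductivity function $F$ in (2), for various choices of positive weights satisfying $F(\mathbf{1}_4) = 1$. For all such $F(\mathbf{x})$ the result of Shneiberg [8] quoted in Section 1 shows that $X_n$ satisfies a strong law if $X_0 \in [0,1]$, say. The first partials of $F$ have the form, for example,

$\frac{\partial F(\mathbf{x})}{\partial x_1} = \frac{w_1^{-1}x_1^{-2}}{((w_1 x_1)^{-1} + (w_2 x_2)^{-1})^2},$

and therefore $F'(c_n\mathbf{1}_4)$ does not depend on $c_n$. In particular, for all $n$,

$\boldsymbol{\alpha}_n = \left(\frac{w_1^{-1}}{(w_1^{-1} + w_2^{-1})^2},\ \frac{w_2^{-1}}{(w_1^{-1} + w_2^{-1})^2},\ \frac{w_3^{-1}}{(w_3^{-1} + w_4^{-1})^2},\ \frac{w_4^{-1}}{(w_3^{-1} + w_4^{-1})^2}\right)^T,$

from which

(47)  $\varphi = \lambda^{-3}\left[\frac{w_1^{-3} + w_2^{-3}}{(w_1^{-1} + w_2^{-1})^6} + \frac{w_3^{-3} + w_4^{-3}}{(w_3^{-1} + w_4^{-1})^6}\right],$

where

$\lambda = \left[\frac{w_1^{-2} + w_2^{-2}}{(w_1^{-1} + w_2^{-1})^4} + \frac{w_3^{-2} + w_4^{-2}}{(w_3^{-1} + w_4^{-1})^4}\right]^{1/2}.$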

As an illustration, define the "side equally weighted network" to be the one with $\mathbf{w} = (w, w, 2 - w, 2 - w)^T$ for $w \in [1, 2)$; such weights are positive and satisfy $F(\mathbf{1}_4) = 1$. For $w = 1$ all weights are equal, and we have $\boldsymbol{\alpha} = 4^{-1}\mathbf{1}_4$, and hence $\varphi$ achieves its minimum value $1/2 = 1/\sqrt{k}$ with $k = 4$. By Theorem 1.1, for all $\gamma \in (1/2, 1)$ there exists a constant $C$ such that $d(W_n, \mathcal{N}) \le C\gamma^n$, with $\gamma$ close to $1/2$ corresponding to the rate $N^{-1/2+\epsilon}$ for small $\epsilon > 0$ and $N = 4^n$, the number of variables at stage $n$. As $w$ increases from 1 to 2, $\varphi$ increases continuously from $1/2$ to $1/\sqrt{2}$, with $w$ close to 2 corresponding to the least favorable rate for the side equally weighted network of $N^{-1/4+\epsilon}$ for any $\epsilon > 0$.

With only the restriction that the weights are positive and satisfy $F(\mathbf{1}_4) = 1$, consider

$\mathbf{w} = (1 + 1/t,\ s,\ t,\ 1/t)^T,$

where $s = [(1 - (1/t + t)^{-1})^{-1} - (1 + 1/t)^{-1}]^{-1}$, $t > 0$.

When $t = 1$ we have $s = 2/3$ and $\varphi = 11\sqrt{2}/27$. As $t \to \infty$, $s/t \to 1/2$ and $\boldsymbol{\alpha}$ tends to the standard basis vector $(1, 0, 0, 0)$, so $\varphi \to 1$. Since $11\sqrt{2}/27 < 1/\sqrt{2}$, the above two examples show that the value of $\gamma$ given by Theorem 1.1 for the diamond lattice can take any value in the range $(1/2, 1)$, corresponding to $N^{-\theta}$ for any $\theta \in (0, 1/2)$.
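The two weight families above can be explored numerically. The following is a minimal sketch: the function names are illustrative assumptions, `diamond_alpha` is the gradient of (2) at a constant vector, and `phi` evaluates (7).

```python
import math

def diamond_alpha(w):
    """Gradient of (2) at c * (1, 1, 1, 1); the value does not depend on c."""
    w1, w2, w3, w4 = w
    top = (1 / w1 + 1 / w2) ** 2
    bot = (1 / w3 + 1 / w4) ** 2
    return [1 / w1 / top, 1 / w2 / top, 1 / w3 / bot, 1 / w4 / bot]

def phi(alpha):
    """phi of (7): sum_i |a_i|^3 divided by (sum_i a_i^2)^(3/2)."""
    lam = math.sqrt(sum(a * a for a in alpha))
    return sum(abs(a) ** 3 for a in alpha) / lam ** 3

def side_equal(w):
    """Weights (w, w, 2 - w, 2 - w) of the side equally weighted network, w in [1, 2)."""
    return (w, w, 2 - w, 2 - w)

def second_family(t):
    """Weights (1 + 1/t, s, t, 1/t), with s chosen so that F(1, 1, 1, 1) = 1."""
    s = 1 / (1 / (1 - 1 / (1 / t + t)) - 1 / (1 + 1 / t))
    return (1 + 1 / t, s, t, 1 / t)

phi_side = [phi(diamond_alpha(side_equal(w))) for w in (1.0, 1.5, 1.999)]
phi_second = [phi(diamond_alpha(second_family(t))) for t in (1.0, 10.0, 1e4)]
```

The first list starts at $1/2$ and approaches $1/\sqrt{2}$, while the second starts at $11\sqrt{2}/27$ for $t = 1$ and approaches 1 for large $t$, in line with the discussion above.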
6. Composition of strict averaging functions. In this section, we prove Theorem 1.2, which shows when the composition of strictly averaging functions is again strictly averaging.

Proof of Theorem 1.2. We first show $F_{\mathbf{s}}(\mathbf{x})$ satisfies the strict form of property 1. If $\mathbf{x} = t\mathbf{1}_k$, then $F_{\mathbf{s}}(\mathbf{x}) = F_0(s_1 t, \ldots, s_k t) = F_0(\mathbf{s})t = t$ and property 1 is satisfied in this case. Hence assume $\min_i x_i = x < y = \max_i x_i$. For such $\mathbf{x}$, if there is a $t$ such that $F_i(\mathbf{x}_i) = t$ for all $i = 1, \ldots, k$, then for some $i$ and $j$ we have $y = x_j$, $j \in I_i$, and hence $x < F_i(\mathbf{x}_i) = t$ since $F_i$ is strictly averaging, and similarly, $t < y$. Hence $x < F_{\mathbf{1}}(\mathbf{x}) = t < y$.

For $\mathbf{x}$ such that for all $i \in I_0$, $s_i F_i(\mathbf{x}_i) = t$ for some $t$, we have

$F_{\mathbf{s}}(\mathbf{x}) = F_0(s_1 F_1(\mathbf{x}_1), \ldots, s_k F_k(\mathbf{x}_k)) = F_0(t\mathbf{1}_k) = t.$

For $\mathbf{s} = \mathbf{1}_k$ we have just shown the strict inequality $x < t < y$ holds. Otherwise $\mathbf{s} \ne \mathbf{1}_k$ and by $F_0(\mathbf{s}) = 1$ we have $\min_i s_i < 1 < \max_i s_i$, and since $F_i(\mathbf{x}_i) = t/s_i$ for all $i$ there exist $i_1$ and $i_2$ such that

$x \le F_{i_1}(\mathbf{x}_{i_1}) < t < F_{i_2}(\mathbf{x}_{i_2}) \le y,$

yielding again the required strict inequality.

For $\mathbf{x}$ such that there are $i_1, i_2$ such that $s_{i_1}F_{i_1}(\mathbf{x}_{i_1}) \ne s_{i_2}F_{i_2}(\mathbf{x}_{i_2})$, we have $s_j F_j(\mathbf{x}_j) < \max_i s_i F_i(\mathbf{x}_i)$ for some $j$. Since $F_0$ is strictly monotone and homogeneous,

$F_{\mathbf{s}}(\mathbf{x}) = F_0(s_1 F_1(\mathbf{x}_1), \ldots, s_k F_k(\mathbf{x}_k)) < F_0\left(s_1\max_i F_i(\mathbf{x}_i), \ldots, s_k\max_i F_i(\mathbf{x}_i)\right) = \max_i F_i(\mathbf{x}_i)F_0(\mathbf{s}) = \max_i F_i(\mathbf{x}_i) \le y.$

The argument for the minimum is the same; hence $F_{\mathbf{s}}(\mathbf{x})$ satisfies the strict form of property 1.

Since the composition of strictly monotone increasing functions is strictly monotone, the strict form of property 2 is satisfied for $F_{\mathbf{s}}(\mathbf{x})$.

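The claim for $F_{\mathbf{1}}(\mathbf{x})$ now follows by setting

$G_i(\mathbf{x}_i) = \frac{F_i(\mathbf{x}_i)}{F_i(\mathbf{1}_{|I_i|})}, \quad s_i = \frac{F_i(\mathbf{1}_{|I_i|})F_0(\mathbf{1}_k)}{F_{\mathbf{1}}(\mathbf{1}_k)} \quad \text{for } i = 0, 1, \ldots, k,$

so that

$\frac{F_{\mathbf{1}}(\mathbf{x})}{F_{\mathbf{1}}(\mathbf{1}_k)} = \frac{F_0(F_1(\mathbf{x}_1), \ldots, F_k(\mathbf{x}_k))}{F_0(F_1(\mathbf{1}_{|I_1|}), \ldots, F_k(\mathbf{1}_{|I_k|}))} = G_0(s_1 G_1(\mathbf{x}_1), \ldots, s_k G_k(\mathbf{x}_k)),$

where $G_i(\mathbf{x}_i)$ is strictly averaging with $G_0$ homogeneous, and $G_0(\mathbf{s}) = 1$. □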